AITopics | short context

Collaborating Authors

short context

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Induced Model Matching: Restricted Models Help Train Full-Featured Models

Neural Information Processing SystemsFeb-15-2026, 20:28:10 GMT

This restricted model may be thought of as "side-information", and can come

artificial intelligence, machine learning, natural language, (14 more...)

Neural Information Processing Systems

Country: North America > United States > Illinois > Cook County > Chicago (0.04)

Genre:

Research Report > New Finding (0.69)
Research Report > Experimental Study (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Short-Context Dominance: How Much Local Context Natural Language Actually Needs?

Vakilian, Vala, Wang, Zimeng, Rawat, Ankit Singh, Thrampoulidis, Christos

arXiv.org Artificial IntelligenceDec-10-2025

We investigate the short-context dominance hypothesis: that for most sequences, a small local prefix suffices to predict their next tokens. Using large language models as statistical oracles, we measure the minimum context length (MCL) needed to reproduce accurate full-context predictions across datasets with sequences of varying lengths. For sequences with 1-7k tokens from long-context documents, we consistently find that 75-80% require only the last 96 tokens at most. Given the dominance of short-context tokens, we then ask whether it is possible to detect challenging long-context sequences for which a short local prefix does not suffice for prediction. We introduce a practical proxy to MCL, called Distributionally Aware MCL (DaMCL), that does not require knowledge of the actual next-token and is compatible with sampling strategies beyond greedy decoding. Our experiments validate that simple thresholding of the metric defining DaMCL achieves high performance in detecting long vs. short context sequences. Finally, to counter the bias that short-context dominance induces in LLM output distributions, we develop an intuitive decoding algorithm that leverages our detector to identify and boost tokens that are long-range-relevant. Across Q&A tasks and model architectures, we confirm that mitigating the bias improves performance.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2512.08082

Country: North America > United States (0.92)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Beyond Length: Quantifying Long-Range Information for Long-Context LLM Pretraining Data

Deng, Haoran, Lin, Yingyu, Lin, Zhenghao, Liu, Xiao, Sun, Yizhou, Ma, Yi-An, Gong, Yeyun

arXiv.org Artificial IntelligenceOct-31-2025

Long-context language models unlock advanced capabilities in reasoning, code generation, and document summarization by leveraging dependencies across extended spans of text. However, a significant portion of readily available long-text data lacks meaningful long-distance dependencies; most spans can be predicted using only local context. Training on such data is inefficient, making careful data selection crucial. Therefore, we introduce LongFilter, a framework for curating training data tailored to long-context pretraining. LongFilter measures the information gain provided by extended context by contrasting model predictions under long-context versus short-context settings, thereby identifying samples where long-range dependencies are essential. Experiments with LLaMA-3-8B, extending its context length from 8K to 64K, show that LongFilter efficiently selects high-quality data and yields substantial improvements on benchmarks such as HELMET, LongBench, and RULER.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.25804

Country: North America > United States > California (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Induced Model Matching: Restricted Models Help Train Full-Featured Models

Neural Information Processing SystemsOct-10-2025, 06:05:33 GMT

This restricted model may be thought of as "side-information", and can come

experiment, imm, objective, (10 more...)

Neural Information Processing Systems

Country: North America > United States > Illinois > Cook County > Chicago (0.04)

Genre:

Research Report > New Finding (0.69)
Research Report > Experimental Study (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Are Long-LLMs A Necessity For Long-Context Tasks?

Qian, Hongjin, Liu, Zheng, Zhang, Peitian, Mao, Kelong, Zhou, Yujia, Chen, Xu, Dou, Zhicheng

arXiv.org Artificial IntelligenceMay-24-2024

The learning and deployment of long-LLMs remains a challenging problem despite recent progresses. In this work, we argue that the long-LLMs are not a necessity to solve long-context tasks, as common long-context tasks are short-context solvable, i.e. they can be solved by purely working with oracle short-contexts within the long-context tasks' inputs. On top of this argument, we propose a framework called LC-Boost (Long-Context Bootstrapper), which enables a short-LLM to address the long-context tasks in a bootstrapping manner. In our framework, the short-LLM prompts itself to reason for two critical decisions: 1) how to access to the appropriate part of context within the input, 2) how to make effective use of the accessed context. By adaptively accessing and utilizing the context based on the presented tasks, LC-Boost can serve as a general framework to handle diversified long-context processing problems. We comprehensively evaluate different types of tasks from popular long-context benchmarks, where LC-Boost is able to achieve a substantially improved performance with a much smaller consumption of resource.

lc-boost, long context, short context, (16 more...)

arXiv.org Artificial Intelligence

2405.15318

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
North America > Dominican Republic (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(2 more...)

Genre:

Research Report (0.64)
Personal (0.46)

Industry: Leisure & Entertainment (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Induced Model Matching: How Restricted Models Can Help Larger Ones

Muneeb, Usama, Ohannessian, Mesrob I.

arXiv.org Artificial IntelligenceFeb-19-2024

We consider scenarios where a very accurate predictive model using restricted features is available at the time of training of a larger, full-featured, model. This restricted model may be thought of as "side-information", derived either from an auxiliary exhaustive dataset or on the same dataset, by forcing the restriction. How can the restricted model be useful to the full model? We propose an approach for transferring the knowledge of the restricted model to the full model, by aligning the full model's context-restricted performance with that of the restricted model's. We call this methodology Induced Model Matching (IMM) and first illustrate its general applicability by using logistic regression as a toy example. We then explore IMM's use in language modeling, the application that initially inspired it, and where it offers an explicit foundation in contrast to the implicit use of restricted models in techniques such as noising. We demonstrate the methodology on both LSTM and transformer full models, using $N$-grams as restricted models. To further illustrate the potential of the principle whenever it is much cheaper to collect restricted rather than full information, we conclude with a simple RL example where POMDP policies can improve learned MDP policies via IMM.

experiment, imm, prediction, (11 more...)

arXiv.org Artificial Intelligence

2402.12513

Country: North America > United States > Illinois > Cook County > Chicago (0.04)

Genre:

Research Report > New Finding (0.49)
Research Report > Experimental Study (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Add feedback